Multilevel/Hierarchical Models
Setup
For this tutorial, you will need to install the lme4 package. This package contains all the functions we will need to begin fitting multilevel models in R. We also use the DiagrammeR package for making diagrams.
# Install the lme4 package
install.packages('lme4')
install.packages("DiagrammeR")

Introduction
So far in class we have covered models that place a rather strict assumption on the structure of the data: that the observations are independent of one another. In this tutorial and the one that follows, we will cover a set of regression models for data that are structured in groups, where coefficients can vary by group! In my own work I have found many uses for these models, as hierarchically structured data arise naturally.
First, some terminology. You will find these types of models referenced under various names, often used interchangeably. Here I list a few:
- Multilevel Models
- Hierarchical Models
- Mixed Effects models
- Random Effects models
We will start simply by giving some examples of the types of data that we will model with hierarchical models. We will then look at some random intercept models followed by some random slope models. All data can be found in the data/ folder of this GitHub repository.
Example data
Multilevel models are used when we want to model how relationships vary across units (levels). Data for multilevel models are often (but not always) described as coming from a nested structure. For example:
- Students within classes within schools
- Students within project groups within a class
- Residents within apartment buildings within cities
- Patients treated by doctors
- Patients within treatment groups measured over time
When presented with this type of data, it is useful to think about the structure and map out the different levels.
For example: we want to model math scores for students in 25 different schools across 5 cities. We would map out these levels as:
- Level 1: the student
- Level 2: the classroom
- Level 3: the school
- Level 4: the city
We do this because we may believe that any effects of interest may vary by one or more of these groups. The effect of eating lunch before a math test may vary between classrooms, or schools or cities (or all of them).
Another example: patients treated by 10 doctors have their weight measured 4 times over the course of a month. We could map out this structure as:
- Level 1: individual weight measurements
- Level 2: the patient
- Level 3: the doctor
Here we have measurements nested within a patient nested within a doctor.
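Looking ahead, nested structures like this map directly onto lme4 model formulas. The sketch below simulates toy data with the measurement-within-patient-within-doctor structure; all names here (weights, doctor, patient, visit) are made up for illustration:

```r
library(lme4)

# toy data: 10 doctors x 5 patients each x 4 weight measurements
set.seed(1)
weights <- expand.grid(doctor = 1:10, patient = 1:5, visit = 1:4)
weights$weight <- 70 +
  rnorm(10, sd = 2)[weights$doctor] +                              # doctor-level shift
  rnorm(50, sd = 1)[(weights$doctor - 1) * 5 + weights$patient] +  # patient-level shift
  rnorm(nrow(weights), sd = 0.5)                                   # measurement noise

# `(1 | doctor/patient)` expands to `(1 | doctor) + (1 | doctor:patient)`:
# a random intercept per doctor and per patient-within-doctor
fit <- lmer(weight ~ 1 + (1 | doctor/patient), data = weights)
```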
Pause here and try to think of another example of a multilevel data structure.
Motivating data example
Here we motivate the models using some sample data from the Cherry Blossom Running Race. This data comes from the GitHub repo of Modern Data Science with R.
library(dplyr)
library(gt)
cherry <- readr::read_csv('data/cherry-blossom-race.csv')
cherry %>%
head() %>%
gt() %>%
tab_header(title = "Cherry Blossom Running Race",
subtitle = "A sub-sample of outcomes for the annual Cherry Blossom Ten Mile race in Washington, D.C")

| Cherry Blossom Running Race | |||||
|---|---|---|---|---|---|
| A sub-sample of outcomes for the annual Cherry Blossom Ten Mile race in Washington, D.C | |||||
| runner | age | net | gun | year | previous |
| 1 | 53 | 83.98333 | 87.03333 | 2002 | 0 |
| 1 | 54 | 74.30000 | 74.56667 | 2003 | 1 |
| 1 | 55 | 75.15000 | 75.26667 | 2004 | 2 |
| 1 | 56 | 74.21667 | 74.40000 | 2005 | 3 |
| 1 | 57 | 74.25000 | 74.33333 | 2006 | 4 |
| 1 | 58 | NA | 74.83333 | 2007 | 5 |
The data set contains the following variables:
- runner: a unique identifier for the runner
- age: age of the runner
- net: time to complete the race, from starting line to finish line (minutes)
- gun: time between the official start of the race and the finish line (minutes)
- year: year of the race
- previous: the number of races the runner had previously completed
The outcome for this data will be the variable net: the time for each runner between the start line and finish line. Let’s explore the data a little.
# How many runners
n_distinct(cherry$runner)

## [1] 36
# How many races for each runner
cherry %>%
group_by(runner) %>%
summarize(`number of races` = n(),
`age at first race` = min(age),
`age at last race` = max(age),
`mean race time` = round(mean(net, na.rm = T), 2),
`sd race time` = round(sd(net, na.rm = T), 1),
`missing values in outcome` = sum(is.na(net))) %>%
gt() %>%
tab_header(title = "Summary statistics for each runner")

| Summary statistics for each runner | ||||||
|---|---|---|---|---|---|---|
| runner | number of races | age at first race | age at last race | mean race time | sd race time | missing values in outcome |
| 1 | 7 | 53 | 59 | 76.38 | 4.3 | 2 |
| 2 | 7 | 52 | 59 | 83.97 | 3.9 | 2 |
| 3 | 7 | 51 | 57 | 89.94 | 4.1 | 1 |
| 4 | 7 | 52 | 61 | 103.05 | 4.4 | 2 |
| 5 | 7 | 51 | 58 | 76.45 | 5.9 | 1 |
| 6 | 7 | 54 | 63 | 77.40 | 3.4 | 3 |
| 7 | 7 | 51 | 58 | 101.62 | 4.6 | 2 |
| 8 | 7 | 52 | 61 | 104.37 | 4.9 | 3 |
| 9 | 7 | 54 | 62 | 100.79 | 8.2 | 2 |
| 10 | 7 | 52 | 61 | 117.61 | 9.3 | 2 |
| 11 | 7 | 52 | 58 | 89.45 | 8.5 | 2 |
| 12 | 7 | 51 | 59 | 71.77 | 5.4 | 2 |
| 13 | 7 | 53 | 59 | 82.22 | 5.7 | 2 |
| 14 | 7 | 53 | 59 | 96.71 | 3.9 | 0 |
| 15 | 7 | 52 | 59 | 96.06 | 4.5 | 2 |
| 16 | 7 | 52 | 60 | 93.90 | 3.5 | 2 |
| 17 | 7 | 50 | 58 | 100.06 | 17.5 | 2 |
| 18 | 7 | 50 | 59 | 101.60 | 4.2 | 3 |
| 19 | 7 | 52 | 61 | 83.15 | 3.0 | 3 |
| 20 | 7 | 50 | 58 | 108.03 | 8.8 | 2 |
| 21 | 7 | 52 | 58 | 86.10 | 4.1 | 1 |
| 22 | 7 | 50 | 58 | 87.94 | 4.1 | 2 |
| 23 | 7 | 51 | 58 | 103.95 | 4.5 | 1 |
| 24 | 7 | 54 | 60 | 106.15 | 5.1 | 1 |
| 25 | 7 | 51 | 59 | 101.48 | 6.2 | 2 |
| 26 | 7 | 50 | 58 | 85.17 | 2.3 | 2 |
| 27 | 7 | 50 | 57 | 98.48 | 6.9 | 1 |
| 28 | 7 | 50 | 57 | 75.21 | 2.1 | 2 |
| 29 | 7 | 54 | 60 | 63.06 | 1.0 | 1 |
| 30 | 7 | 53 | 60 | 66.55 | 3.9 | 1 |
| 31 | 7 | 52 | 60 | 82.55 | 4.0 | 2 |
| 32 | 7 | 50 | 58 | 78.66 | 2.9 | 2 |
| 33 | 7 | 54 | 63 | 85.64 | 3.2 | 3 |
| 34 | 7 | 52 | 60 | 75.82 | 4.7 | 2 |
| 35 | 7 | 52 | 59 | 86.97 | 2.6 | 1 |
| 36 | 7 | 51 | 60 | 107.49 | 5.7 | 3 |
cherry %>%
ggplot(aes(net)) +
geom_histogram() +
labs(x = "time to complete the race (minutes)")

cherry %>%
ggplot(aes(factor(runner), net)) +
geom_boxplot() +
labs(x = 'Runner ID',
y = "time to complete the race (minutes)")

There are 36 runners in the data and they each completed 7 races.
We can see that there is a lot of variability within and between runners. For example, runner 17 has a lot of variability in each of their runs, while runner 12 has very little variability. Meanwhile runner 12 runs consistently faster race times than runner 17.
When we plot age against running time, we see a very weak association between the two. We can fit a linear regression model to attempt to measure this effect.
cherry %>%
ggplot(aes(age, net)) +
geom_point() +
labs(x = 'age in years',
y = "time to complete the race (minutes)") +
geom_smooth(method = "lm")

lm_model <- lm(net ~ age, data = cherry)
sjPlot::tab_model(lm_model)

| net | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 75.06 | 26.68 – 123.44 | 0.003 |
| age | 0.27 | -0.61 – 1.15 | 0.544 |
| Observations | 185 | ||
| R2 / R2 adjusted | 0.002 / -0.003 | ||
The age coefficient is small and is not statistically significant ( p = 0.544). Are we to conclude that age has no effect on running times?
Let’s consider the model we just fit. We will plot the fitted regression line for a few sample runners.
cherry %>%
filter(runner %in% 1:8) %>%
ggplot(aes(age, net)) +
geom_point() +
facet_wrap(~ runner, nrow=2) +
geom_abline(intercept = coef(lm_model)[1],
slope = coef(lm_model)[2]) +
labs(x = 'age in years',
y = "time to complete the race (minutes)")

Above we selected runners 1 through 8, plotted their ages against run times, and added the fitted regression line from the linear regression model.
Does this line fit the data well for these 8 runners? For runner 3, perhaps, but I would argue it does not fit very well for the other 7. For most of these runners we see a clear pattern: their running times increase as their age increases. They also all have slightly different mean running times.
Below we plot a fitted linear regression line for each runner based on their 7 data points as well as the overall regression line.
cherry %>%
ggplot(aes(age, net, color = factor(runner))) +
geom_point() +
labs(x = 'age in years',
y = "time to complete the race (minutes)",
color = "Runner ID") +
geom_smooth(method = "lm", se=F)+
geom_abline(intercept = coef(lm_model)[1],
slope = coef(lm_model)[2], size = 2, color = "blue")

## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 67 rows containing non-finite values (stat_smooth).
## Warning: Removed 67 rows containing missing values (geom_point).
Complete Pooling
The simple regression model we fit is called the complete pooling model. In this model, we pool all 252 running times (\(y_1, y_2, \ldots y_{252}\)) into one population pool.
We are making the following assumptions:
- each observation is independent of the others
- information about the individual runners is irrelevant to our model of running time vs age.
The first assumption is clearly false as we are measuring the same runners several times. Though the observations on one runner might be independent of those on another, the observations within a runner are correlated. The second assumption takes a little more thought, but it is also incorrect.
We’ve seen above that by making these assumptions, this model has been unable to determine that people tend to get slower as they age.
When we are presented with a nested data structure, the complete pooling model will violate the assumption of independence and we will often make misleading conclusions about any relationships we are investigating.
The No Pooling Model
We’ve seen what happens when we completely pool all of our observations. What is the opposite approach? The no pooling model considers each of our 36 runners separately.
cherry %>%
ggplot(aes(age, net)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
facet_wrap(~ runner)

## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 67 rows containing non-finite values (stat_smooth).
## Warning: Removed 67 rows containing missing values (geom_point).
This model appears to fit the data quite well! The model(s) have picked up on the fact that the effect of age is different between runners and that the mean values vary between runners.
There are problems with this model however.
What if you choose to run in this race next year? How can we predict your race time given your age? You can’t estimate this from the no pooling model!
There is a second issue as well. Let’s look at runner 1 a little more closely.
cherry %>%
filter(runner == 1) %>%
ggplot(aes(age, net)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, fullrange = TRUE)

## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
Based on the rest of the data we have, should we assume that runner 1 will continue to see decreased running times?
In the no pooling approach, we consider only data from runner 1 when drawing this conclusion. But we have data from 35 other runners, and we can use that data to learn that runners typically slow down with age. Perhaps runner 1 will in fact run faster races over the next few years, but the trend is unlikely to be as steep as the no pooling approach suggests.
Let’s summarize these thoughts. With the no pooling approach:
- We cannot generate predictions for new groups that fall outside the data we have modeled
- We assume that data from one runner contains no information relevant to any of the other runners (and, in general, that data from one group tells us nothing about the others)
Multilevel model (partial pooling)
Multilevel models provide a sort of in between of the complete pooling and no pooling approaches, known as partial pooling.
Multilevel models assume that although each group is unique, all groups are connected (having been sampled from some population) and might contain valuable information about one another.
The graph below shows a multilevel data structure in general. For your reference I have included the DiagrammeR code I wrote to display this structure. DiagrammeR is an incredibly useful tool for drawing complex diagrams.
As a short tutorial:
- wrap all your code in quotes inside a call to grViz.
- create your graph with a digraph statement. I have named my graph multilevel_model.
- use a graph statement to give overall attributes (font, size, etc.) to your entire graph.
- use a node statement to create nodes (give attributes inside brackets). After a node statement you can name as many nodes as you want (separated by semicolons).
- when you give a node a name (e.g. Group1), you can also give it a label inside square brackets. Here I am using HTML to create subscripts. For example, [label=<Group<SUB>1</SUB>>] will label my node \(Group_1\).
- add edges between nodes with edge statements. For example, Group1->y_11 will draw a line between the node named Group1 and the node named y_11.

See the docs for more information.
library(DiagrammeR)

## Warning: package 'DiagrammeR' was built under R version 4.1.2
grViz("
digraph multilevel_model {
# a 'graph' statement
graph [overlap = true, fontsize = 10]
# several 'node' statements
node [shape = box,
fontname = Helvetica]
Population
node [shape = box,
fontname = Helvetica]
Group1[label=<Group<SUB>1</SUB>>];
Group2[label=<Group<SUB>2</SUB>>];
group_dot[label=<...>];
Groupm[label=<Group<SUB>m</SUB>>]
node [shape = circle,
fixedsize = true,
width = 0.9] // sets as circles
y_11[label=<y<SUB>11</SUB>>];
y_21[label=<y<SUB>21</SUB>>];
y_31[label=<y<SUB>31</SUB>>];
y_12[label=<y<SUB>12</SUB>>];
y_22[label=<y<SUB>22</SUB>>];
y_32[label=<y<SUB>32</SUB>>];
y_dot[label=<...>];
y_1m[label=<y<SUB>1m</SUB>>];
y_2m[label=<y<SUB>2m</SUB>>];
y_3m[label=<y<SUB>3m</SUB>>]
# several 'edge' statements
Population->Group1 Population->Group2 Population ->group_dot Population->Groupm
Group1->y_11 Group1->y_21 Group1->y_31
Group2->y_12 Group2->y_22 Group2->y_32
group_dot -> y_dot
Groupm->y_1m Groupm->y_2m Groupm->y_3m
}
")

We will go into details shortly, but below I fit a random intercept model to this data, then plot the regression lines from the complete pooling, no pooling, and partial pooling models for 4 example runners, and finally for a hypothetical new runner (you).
# first the no pooling model
no_pool_model <- cherry %>%
group_by(runner) %>%
do(fit_age = broom::tidy(lm(net ~ age, data = .))) %>%
unnest(fit_age) %>%
select(runner, term, estimate) %>%
pivot_wider(names_from = term, values_from = estimate) %>%
rename(intercept = `(Intercept)`, slope = age)
# look at the first few observations of this object.
no_pool_model %>%
head %>%
gt() %>%
tab_header(title = "No pooling model",
subtitle = "First few runner coefficients")

| No pooling model | ||
|---|---|---|
| First few runner coefficients | ||
| runner | intercept | slope |
| 1 | 183.90500 | -1.95500000 |
| 2 | 20.43000 | 1.17666667 |
| 3 | 86.49683 | 0.06428571 |
| 4 | 40.31261 | 1.13648649 |
| 5 | -11.22422 | 1.62867495 |
| 6 | 15.64622 | 1.06022222 |
#####################################
# PARTIAL POOLING RANDOM INTERCEPT MODEL
library(lme4)

## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
random_int_model <- lmer(net ~ age + (1 | runner), data = cherry)
random_intercepts <- ranef(random_int_model)
fixed_effects <- fixef(random_int_model)
partial_pooling <- tibble(
runner = 1:36,
partial_pool_intercept = fixed_effects[1] + random_intercepts$runner$`(Intercept)`,
partial_pool_slope = fixed_effects[2]
)
# plot for runners 4 runners
cherry %>%
filter(runner %in% c(1, 20, 22, 25)) %>%
left_join(no_pool_model, by = "runner") %>%
left_join(partial_pooling, by = "runner") %>%
ggplot(aes(age, net)) +
geom_point() +
geom_abline(intercept = coef(lm_model)[1],
slope = coef(lm_model)[2], color = "blue") +
geom_abline(aes(intercept = intercept, slope = slope,
color = "No Pool")) +
geom_abline(aes(intercept = partial_pool_intercept, slope = partial_pool_slope,
color = "Partial Pool")) +
facet_wrap(~ runner) +
coord_cartesian(ylim = c(70, 120))

## Warning: Removed 8 rows containing missing values (geom_point).
# We can estimate a new runner's time as well
new_runner <- tibble(runner = 37, age = 50:70)
predictions <- predict(random_int_model, new_runner, allow.new.levels = T)
new_runner %>%
mutate(predictions = predictions) %>%
ggplot(aes(age, predictions)) +
geom_point() +
geom_line() +
geom_abline(intercept = coef(lm_model)[1],
slope = coef(lm_model)[2], color = "blue")+
coord_cartesian(ylim = c(70, 120)) +
ggtitle("Estimated running time for a new runner")

Notice the use of allow.new.levels = T in the prediction. The blue line is the complete pooling estimate while the black line is the estimate from our multilevel model.
Our partial pooling model doesn’t ignore the population from which our runners were sampled. We can use either the complete pooling or the partial pooling (hierarchical) model to generate predictions for a new runner, and the results are quite different. The complete pooling model hasn’t picked up on the fact that runners get slower as they age, while the multilevel model has. For a new runner, the predicted rate of decline equals the average decline across the 36 runners in the sample.
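To see what predict() is doing for an unseen runner: the new runner's random intercept is set to its mean of zero, so the prediction reduces to the fixed effects alone. A quick sketch of this check, reusing random_int_model fitted above:

```r
# for a new runner, the prediction is just intercept + slope * age
fe <- fixef(random_int_model)
manual <- unname(fe[1] + fe[2] * 55)

new_runner <- data.frame(runner = 999, age = 55)
from_lme4 <- unname(predict(random_int_model, new_runner, allow.new.levels = TRUE))

all.equal(manual, from_lme4)  # the two should agree
```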
Random intercept
We often think of multilevel linear models as an extension of the linear model:
\[y_i = \alpha + \beta x_i + \epsilon_i\] where we allow for varying intercepts
\[y_i = \alpha_{j[i]} + \beta x_i + \epsilon_i\] and further extend to allow for varying slopes by group \[y_i = \alpha_{j[i]} + \beta_{j[i]} x_i + \epsilon_i\]
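As a sketch of the correspondence in lme4 syntax (lmer is covered in detail later in this tutorial), these three models map onto the following calls for the cherry data:

```r
library(lme4)

# complete pooling: one intercept and one slope for everyone
lm(net ~ age, data = cherry)

# varying intercepts: each runner gets their own alpha_j
lmer(net ~ age + (1 | runner), data = cherry)

# varying intercepts and slopes: each runner gets their own alpha_j and beta_j
lmer(net ~ age + (1 + age | runner), data = cherry)
```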
Notation
- The smallest item of measurement is called a unit and is indexed as \(i = 1,\ldots, n\).
- The outcome measurement \(y = (y_1, \ldots y_n)\) is at the unit level.
- Predictors are represented by an \(n\times k\) matrix \(X\) so that the vector of predicted values \(\hat{y} = X\beta\).
- For each unit \(i\), we denote its row vector of predictors by \(X_i\), so that \(\hat{y}_i = X_i\beta\) is the prediction for unit \(i\).
- We label groups as \(j = 1,\ldots, J\). This works for single-level grouping (e.g. students in schools, people over time, patients within doctors, etc.)
- If we need a second level of grouping we will use \(k = 1,\ldots, K\).
- Index variables \(j[i]\) code group membership. If \(j[12]=3\), then the 12th unit in the data (\(i=12\)) belongs to group 3.
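The index-variable notation is easy to mirror in R, where a vector of group IDs can subscript a vector of group-level coefficients:

```r
# j[i] maps each unit i to its group
j <- c(1, 1, 2, 3, 3, 3)   # group membership for units i = 1..6
alpha <- c(10, 20, 30)     # one intercept per group j = 1..3
alpha[j]                   # alpha_{j[i]} for each unit: 10 10 20 30 30 30
```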
We specify the varying-intercept model as
\[y_i \sim N(\alpha_{j[i]} + \beta x_i, \sigma_y^2)\]
or as
\[y_i = \alpha_{j[i]} + \beta x_i + \epsilon_i\]
With multiple predictors we write the model as:
\[y_i \sim N(X_iB, \sigma_y^2)\]
Where \(B\) is a matrix of coefficients.
- The standard deviation is \(\sigma_y\) for data-level errors, and \(\sigma_{\alpha}\) and \(\sigma_{\beta}\) for group-level errors.
- Group level predictors are represented by a matrix \(U\) with \(J\) rows. For example, in the group level model \(\alpha_j \sim N(U_j\gamma, \sigma_{\alpha}^2)\). If we have a single group level predictor, we label it as lowercase \(u\).
Complete pooling with no predictors
Remember that multilevel regression can be thought of as a method for compromising between the two extremes of excluding a categorical predictor from a model (complete pooling), or estimating separate models within each level of the categorical predictor (no pooling).
We will use the radon.csv dataset which contains Radon measurements of 919 owner-occupied homes in 85 counties in Minnesota.
The data has the following variables
- log.radon: Radon measurement (in log pCi/L, i.e., log picoCurie per liter)
- basement: indicator for the level of the home at which the radon measurement was taken; 0 = basement, 1 = first floor
- uranium: average county-level soil uranium content
- county: county ID
- county.name: county name as a factor
radon <- readr::read_csv('data/radon.csv')

Let’s plot the complete pooling, no pooling, and multilevel (partial pooling) estimates for this data. Here we are interested in the log radon measurements per county.
# for extracting the standard errors
source('helpers.R')
# the complete pooling estimates
mean_log_radon <- mean(radon$log.radon)
# no pooling estimates
count_means <- radon %>%
group_by(county) %>%
summarize(mean_radon = mean(log.radon),
county_var = var(log.radon),
sample_size = n()) %>%
ungroup() %>%
mutate(county_sds = mean(sqrt(county_var[!is.na(county_var)]))/sqrt(sample_size))
# plot of mean no pooling and complete pooling estimates
p1 <- count_means %>%
ggplot(aes(sample_size , mean_radon)) +
geom_point() +
geom_errorbar(aes(ymin = mean_radon - county_sds,
ymax = mean_radon + county_sds)) +
geom_hline(yintercept =mean_log_radon, color = 'blue', size = 1) +
ggtitle("No pooling and complete pooling estimates") +
coord_cartesian(ylim = c(0, 3.5))
# make a multilevel model to get the partial pooling estimates
random_int_model <- lmer(log.radon ~ 1 + (1| county), data = radon)
# extract the standard errors
se <- random_int_model %>%
ranefSE()%>%
rename(intercept=`(Intercept)`,se=`se.(Intercept)`)
# make the partial pooling plot
p2 <- count_means %>%
mutate(model_intercept = coef(random_int_model)$county$`(Intercept)`) %>%
bind_cols(se) %>%
ggplot(aes(sample_size , model_intercept)) +
geom_point() +
geom_errorbar(aes(ymin = model_intercept - se,
ymax = model_intercept + se)) +
geom_hline(yintercept =mean_log_radon, color = 'blue', size = 1) +
ggtitle("Partial pooling estimates")+
coord_cartesian(ylim = c(0, 3.5))
cowplot::plot_grid(p1, p2)

Complete pooling ignores variation between counties, while the no pooling analysis overstates it. Notice how estimates for counties with fewer observations shrink towards the population mean, while estimates for counties with more observations shrink less. This makes intuitive sense: the more data we have for a county, the more certain we are of its estimate.
The goal of estimation is the average log radon level \(\alpha_j\) for each county \(j\). For each county we have a sample size \(n_j\) .
In this simple case we can get the multilevel estimate for a given county \(j\) as a weighted average of the mean of the observations in that county, \(\bar{y}_j\), and the overall mean, \(\bar{y}_{all}\):
\[\alpha_{j}^{\text{multilevel}} \approx \frac{\frac{n_j}{\sigma_y^2}\bar{y}_j + \frac{1}{\sigma_{\alpha}^2}\bar{y}_{all} }{\frac{n_j}{\sigma_y^2} + \frac{1}{\sigma_{\alpha}^2} }\]
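This formula can be sketched directly in R. The numbers below are illustrative stand-ins, not values from the radon fit:

```r
# partial-pooling estimate as a precision-weighted average of the
# county mean and the overall mean
partial_pool_estimate <- function(ybar_j, ybar_all, n_j, sigma_y, sigma_alpha) {
  w_j   <- n_j / sigma_y^2       # precision of the county mean
  w_all <- 1 / sigma_alpha^2     # precision of the overall mean
  (w_j * ybar_j + w_all * ybar_all) / (w_j + w_all)
}

# a county with few observations is pulled strongly toward the overall mean,
# while a county with many observations stays near its own mean
partial_pool_estimate(ybar_j = 2.5, ybar_all = 1.3, n_j = 2,
                      sigma_y = 0.76, sigma_alpha = 0.33)
partial_pool_estimate(ybar_j = 2.5, ybar_all = 1.3, n_j = 100,
                      sigma_y = 0.76, sigma_alpha = 0.33)
```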
Adding predictors
We will add the predictor basement (0 = basement, 1 = first floor). We can generate estimates for our complete pooling, no pooling, and partial pooling as follows.
complete_pool <- lm(log.radon ~ basement, data = radon)
summary(complete_pool)

##
## Call:
## lm(formula = log.radon ~ basement, data = radon)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6293 -0.5383 0.0342 0.5603 2.5486
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.32674 0.02972 44.640 <2e-16 ***
## basement -0.61339 0.07284 -8.421 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8226 on 917 degrees of freedom
## Multiple R-squared: 0.07178, Adjusted R-squared: 0.07077
## F-statistic: 70.91 on 1 and 917 DF, p-value: < 2.2e-16
coef(complete_pool)[2]

## basement
## -0.6133948
no_pool <- lm(log.radon ~ basement + factor(county) - 1, data = radon)
coef(no_pool)[1]

## basement
## -0.720539
partial_pool <- lmer(log.radon ~ basement + (1 | county), data = radon)
fixef(partial_pool)[2]

## basement
## -0.6929937
I leave it as an exercise to make a plot similar to the one for our first model (for a few sampled counties), but including our basement predictor.
In general, we see the same problems here as we did in the first example using the runner data.
Specifying a multilevel model
With this radon data, the simplest multilevel model is specified as:
\[y_i \sim N(\alpha_{j[i]} + \beta x_i, \sigma_y^2), \text{ for } i = 1, \ldots, n\]
This looks similar to the no pooling model; the difference is that in the no pooling model the \(\alpha_j\) are fixed at their least squares estimates.
In the multilevel model we put a soft constraint on the \(\alpha_j\)’s. They are assigned a probability distribution
\[ \alpha_j \sim N(\mu_{\alpha}, \sigma_{\alpha}^2), \text{ for } j = 1, \ldots, J\]
The mean \(\mu_{\alpha}\) and standard deviation \(\sigma_{\alpha}\) get estimated using the data. This constraint has the effect of pulling the estimates \(\alpha_j\) towards the population mean \(\mu_{\alpha}\).
Let’s look at these estimates from our fitted model. We can use VarCorr() to extract the estimated standard deviations of the residuals and the random intercepts, and extract the fixed effects with fixef().
multilevel_model <- lmer(log.radon ~ basement + (1 | county), data = radon)
sigmas <- VarCorr(multilevel_model)
print(sigmas)

## Groups Name Std.Dev.
## county (Intercept) 0.32822
## Residual 0.75559

fixed_effects <- fixef(multilevel_model)
print(fixed_effects)

## (Intercept) basement
## 1.4615979 -0.6929937
So with this model we have,
- \(\sigma_{\alpha}\) = 0.328
- \(\mu_{\alpha}\) = 1.46
- \(\sigma_{y}\) = 0.756
- \(\beta\) = -0.693
More on the lmer function
We will be using lmer to fit linear multilevel models and glmer to fit generalized linear multilevel models.
This function works similarly to the lm and glm functions we have used in the past, except for the syntax used to specify random effects (both intercepts and slopes). Below we fit a model with a random intercept and extract components from the model.
# random intercept model
random_intercept <- lmer(log.radon ~ basement + (1 | county), data = radon)
# model summary
summary(random_intercept)

## Linear mixed model fit by REML ['lmerMod']
## Formula: log.radon ~ basement + (1 | county)
## Data: radon
##
## REML criterion at convergence: 2171.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.3989 -0.6155 0.0029 0.6405 3.4281
##
## Random effects:
## Groups Name Variance Std.Dev.
## county (Intercept) 0.1077 0.3282
## Residual 0.5709 0.7556
## Number of obs: 919, groups: county, 85
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.46160 0.05158 28.339
## basement -0.69299 0.07043 -9.839
##
## Correlation of Fixed Effects:
## (Intr)
## basement -0.288
# estimated coefficients
model_coef <- coef(random_intercept)
model_coef$county %>% head()

## (Intercept) basement
## 1 1.1915003 -0.6929937
## 2 0.9276468 -0.6929937
## 3 1.4792143 -0.6929937
## 4 1.5045012 -0.6929937
## 5 1.4461503 -0.6929937
## 6 1.4801817 -0.6929937
# fixed effects
fixef(random_intercept)

## (Intercept) basement
## 1.4615979 -0.6929937
# random effects
random_effects <- ranef(random_intercept)
random_effects$county %>% head()

## (Intercept)
## 1 -0.27009754
## 2 -0.53395109
## 3 0.01761646
## 4 0.04290332
## 5 -0.01544759
## 6 0.01858386
# standard errors fixed effects
se.fixef(random_intercept)

## (Intercept) basement
## 0.05157623 0.07043081
# standard errors random effects
random_effects <- se.ranef(random_intercept)
random_effects$county %>% head()

## (Intercept)
## 1 0.24777409
## 2 0.09981833
## 3 0.26227671
## 4 0.21544775
## 5 0.24777409
## 6 0.26227671
The good news is that other packages we have used, like sjPlot, play nicely with lmer objects.
sjPlot::tab_model(random_intercept)

## Registered S3 methods overwritten by 'parameters':
## method from
## as.double.parameters_kurtosis datawizard
## as.double.parameters_skewness datawizard
## as.double.parameters_smoothness datawizard
## as.numeric.parameters_kurtosis datawizard
## as.numeric.parameters_skewness datawizard
## as.numeric.parameters_smoothness datawizard
## print.parameters_distribution datawizard
## print.parameters_kurtosis datawizard
## print.parameters_skewness datawizard
## summary.parameters_kurtosis datawizard
## summary.parameters_skewness datawizard
| log radon | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 1.46 | 1.36 – 1.56 | <0.001 |
| basement | -0.69 | -0.83 – -0.55 | <0.001 |
| Random Effects | |||
| σ2 | 0.57 | ||
| τ00 county | 0.11 | ||
| ICC | 0.16 | ||
| N county | 85 | ||
| Observations | 919 | ||
| Marginal R2 / Conditional R2 | 0.090 / 0.234 | ||
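The ICC row in the table above is the intraclass correlation: the share of total variance attributable to between-county variation. As a sketch, it can be recomputed by hand from the variance components of random_intercept:

```r
# variance components from the fitted random intercept model
vc <- as.data.frame(VarCorr(random_intercept))
tau00  <- vc$vcov[vc$grp == "county"]    # between-county variance
sigma2 <- vc$vcov[vc$grp == "Residual"]  # residual variance

tau00 / (tau00 + sigma2)  # ICC; about 0.16, matching the table
```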
Random slope model
The next step in multilevel modeling is to allow more than one regression coefficient to vary by group.
We can extend the model we have been working with so far as follows:
\[y_i \sim N(\alpha_{j[i]} + \beta_{j[i]} x_i, \sigma_y^2)\]
\[ \begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} \sim N\Big( \begin{pmatrix} \mu_\alpha \\ \mu_\beta \end{pmatrix}, \begin{pmatrix} \sigma_{\alpha}^2 & \rho\sigma_{\alpha}\sigma_{\beta} \\ \rho\sigma_{\alpha}\sigma_{\beta} & \sigma_{\beta}^2 \end{pmatrix}\Big), \text{ for } j = 1, \ldots, J \]
random_slope <- lmer(log.radon ~ basement + (1 + basement |county), data = radon)
summary(random_slope)

## Linear mixed model fit by REML ['lmerMod']
## Formula: log.radon ~ basement + (1 + basement | county)
## Data: radon
##
## REML criterion at convergence: 2168.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.4044 -0.6224 0.0138 0.6123 3.5682
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## county (Intercept) 0.1216 0.3487
## basement 0.1181 0.3436 -0.34
## Residual 0.5567 0.7462
## Number of obs: 919, groups: county, 85
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.46277 0.05387 27.155
## basement -0.68110 0.08758 -7.777
##
## Correlation of Fixed Effects:
## (Intr)
## basement -0.381
model_coef <- coef(random_slope)
model_coef$county %>% head()

## (Intercept) basement
## 1 1.1445374 -0.5406207
## 2 0.9333795 -0.7708213
## 3 1.4716909 -0.6688831
## 4 1.5354352 -0.7525548
## 5 1.4270378 -0.6206707
## 6 1.4826560 -0.6877094
# fixed effects
fixef(random_slope)

## (Intercept) basement
## 1.4627700 -0.6810982
# random effects
random_effects <- ranef(random_slope)
random_effects$county %>% head()

## (Intercept) basement
## 1 -0.318232574 0.140477472
## 2 -0.529390508 -0.089723141
## 3 0.008920884 0.012215066
## 4 0.072665215 -0.071456546
## 5 -0.035732220 0.060427482
## 6 0.019886022 -0.006611238
# standard errors fixed effects
se.fixef(random_slope)

## (Intercept) basement
## 0.05386725 0.08758298
# standard errors random effects
random_effects <- se.ranef(random_slope)
random_effects$county %>% head()

## (Intercept) basement
## 1 0.2645615 0.3185976
## 2 0.1011331 0.2650752
## 3 0.2990128 0.3158159
## 4 0.2544841 0.2907492
## 5 0.2645615 0.3185976
## 6 0.2710295 0.3357723
We often want to plot these random effects for each level of our grouping variable. The sjPlot package comes with a handy function for this:
# random effects plot
sjPlot::plot_model(random_slope, type = "re")

## Warning in checkMatrixPackageVersion(): Package version inconsistency detected.
## TMB was built with Matrix version 1.4.0
## Current Matrix version is 1.3.4
## Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package
Adding group level predictors
One of the powerful extensions of multilevel models is the ability to add predictors to the \(\mu_\alpha\) term(s). Here we briefly describe this model and show how to fit it. We will get into more details in the next lecture.
This model is specified as
\[ \begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} \sim N\Big( \begin{pmatrix} \gamma_0^\alpha + \gamma_1^\alpha u_j \\ \gamma_0^\beta + \gamma_1^\beta u_j \end{pmatrix}, \begin{pmatrix} \sigma_{\alpha}^2 & \rho\sigma_{\alpha}\sigma_{\beta} \\ \rho\sigma_{\alpha}\sigma_{\beta} & \sigma_{\beta}^2 \end{pmatrix}\Big), \text{ for } j = 1, \ldots, J \]
Here we will add in the county level soil uranium measurements:
random_slope_full <- lmer(log.radon ~ basement + uranium + (1 |county),
data = radon)
summary(random_slope_full)

## Linear mixed model fit by REML ['lmerMod']
## Formula: log.radon ~ basement + uranium + (1 | county)
## Data: radon
##
## REML criterion at convergence: 2134.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.9673 -0.6117 0.0274 0.6555 3.3848
##
## Random effects:
## Groups Name Variance Std.Dev.
## county (Intercept) 0.02446 0.1564
## Residual 0.57523 0.7584
## Number of obs: 919, groups: county, 85
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.46576 0.03794 38.633
## basement -0.66824 0.06880 -9.713
## uranium 0.72027 0.09176 7.849
##
## Correlation of Fixed Effects:
## (Intr) basmnt
## basement -0.357
## uranium 0.145 -0.009
model_coef <- coef(random_slope_full)
model_coef$county %>% head()

## (Intercept) basement uranium
## 1 1.445120 -0.6682448 0.7202676
## 2 1.477009 -0.6682448 0.7202676
## 3 1.478185 -0.6682448 0.7202676
## 4 1.576891 -0.6682448 0.7202676
## 5 1.473999 -0.6682448 0.7202676
## 6 1.439566 -0.6682448 0.7202676
# fixed effects
fixef(random_slope_full)

## (Intercept) basement uranium
## 1.4657628 -0.6682448 0.7202676
# random effects
random_effects <- ranef(random_slope_full)
random_effects$county %>% head()

## (Intercept)
## 1 -0.020642363
## 2 0.011245769
## 3 0.012422028
## 4 0.111128434
## 5 0.008235846
## 6 -0.026196357
# standard errors fixed effects
se.fixef(random_slope_full)

## (Intercept) basement uranium
## 0.03794064 0.06880055 0.09176259
# standard errors random effects
random_effects <- se.ranef(random_slope_full)
random_effects$county %>% head()

## (Intercept)
## 1 0.14458779
## 2 0.08727756
## 3 0.14728903
## 4 0.13729671
## 5 0.14458779
## 6 0.14728903
In this model, \(\gamma_0^\alpha\) is given by the intercept, \(\gamma_0^\beta\) by the basement coefficient, and \(\gamma_1^\alpha\) by the uranium coefficient.
Below we plot the alpha estimates, plus or minus one standard error, against the uranium measurements. This shows how uranium is predictive of the intercept for each county!
# Get the county level uranium measurements
uranium <- radon %>%
group_by(county) %>%
slice(1) %>%
ungroup() %>%
select(county, uranium)
# get the fixed effect
fixed_effects <- fixef(random_slope_full)
# get the estimated alpha
alpha_hat<- fixef(random_slope_full)[1] +
fixef(random_slope_full)[3]*uranium$uranium +
as.vector(ranef(random_slope_full)$county) %>%
pull(`(Intercept)`)
# get the standard errors of the alpha
alpha_se <- se.ranef(random_slope_full)$county %>%
as.data.frame() %>%
pull( `(Intercept)`)
# make a data frame with these values
random_effects <- tibble(county = 1:85,
uranium = uranium$uranium,
alpha_hat = alpha_hat,
alpha_se = alpha_se)
#plot the estimated alpha against the uranium measurements
random_effects %>%
ggplot(aes(uranium, alpha_hat)) +
geom_point() +
geom_errorbar(aes(ymin= alpha_hat -alpha_se,
ymax = alpha_hat + alpha_se)) +
geom_abline(intercept = fixed_effects[1], slope = fixed_effects[3])